skip to main content


Search for: All records

Creators/Authors contains: "Shamma, Shihab"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Free, publicly-accessible full text available August 20, 2024
  2. Free, publicly-accessible full text available August 20, 2024
  3. Speech recognition in noisy environments can be challenging and requires listeners to accurately segregate a target speaker from irrelevant background noise. Stochastic figure-ground (SFG) tasks in which temporally coherent inharmonic pure-tones must be identified from a background have been used to probe the non-linguistic auditory stream segregation processes important for speech-in-noise processing. However, little is known about the relationship between performance on SFG tasks and speech-in-noise tasks nor the individual differences that may modulate such relationships. In this study, 37 younger normal-hearing adults performed an SFG task with target figure chords consisting of four, six, eight, or ten temporally coherent tones amongst a background of randomly varying tones. Stimuli were designed to be spectrally and temporally flat. An increased number of temporally coherent tones resulted in higher accuracy and faster reaction times (RTs). For ten target tones, faster RTs were associated with better scores on the Quick Speech-in-Noise task. Individual differences in working memory capacity and self-reported musicianship further modulated these relationships. Overall, results demonstrate that the SFG task could serve as an assessment of auditory stream segregation accuracy and RT that is sensitive to individual differences in cognitive and auditory abilities, even among younger normal-hearing adults.

     
    more » « less
  4. Experiments to understand the sensorimotor neural interactions in the human cortical speech system support the existence of a bidirectional flow of interactions between the auditory and motor regions. Their key function is to enable the brain to ‘learn’ how to control the vocal tract for speech production. This idea is the impetus for the recently proposed "MirrorNet", a constrained autoencoder architecture. In this paper, the MirrorNet is applied to learn, in an unsupervised manner, the controls of a specific audio synthesizer (DIVA) to produce melodies only from their auditory spectrograms. The results demonstrate how the MirrorNet discovers the synthesizer parameters to generate the melodies that closely resemble the original and those of unseen melodies, and even determine the best set parameters to approximate renditions of complex piano melodies generated by a different synthesizer. This generalizability of the MirrorNet illustrates its potential to discover from sensory data the controls of arbitrary motor-plants. 
    more » « less
  5. Recent advancements in deep learning have led to drastic improvements in speech segregation models. Despite their success and growing applicability, few efforts have been made to analyze the underlying principles that these networks learn to perform segregation. Here we analyze the role of harmonicity on two state-of-the-art Deep Neural Networks (DNN)-based models- Conv-TasNet and DPT-Net [1],[2]. We evaluate their performance with mixtures of natural speech versus slightly manipulated inharmonic speech, where harmonics are slightly frequency jittered. We find that performance deteriorates significantly if one source is even slightly harmonically jittered, e.g., an imperceptible 3% harmonic jitter degrades performance of Conv-TasNet from 15.4 dB to 0.70 dB. Training the model on inharmonic speech does not remedy this sensitivity, instead resulting in worse performance on natural speech mixtures, making inharmonicity a powerful adversarial factor in DNN models. Furthermore, additional analyses reveal that DNN algorithms deviate markedly from biologically inspired algorithms [3] that rely primarily on timing cues and not harmonicity to segregate speech. 
    more » « less
  6. Little is known about how neural representations of natural sounds differ across species. For example, speech and music play a unique role in human hearing, yet it is unclear how auditory representations of speech and music differ between humans and other animals. Using functional ultrasound imaging, we measured responses in ferrets to a set of natural and spectrotemporally matched synthetic sounds previously tested in humans. Ferrets showed similar lower-level frequency and modulation tuning to that observed in humans. But while humans showed substantially larger responses to natural vs. synthetic speech and music in non-primary regions, ferret responses to natural and synthetic sounds were closely matched throughout primary and non-primary auditory cortex, even when tested with ferret vocalizations. This finding reveals that auditory representations in humans and ferrets diverge sharply at late stages of cortical processing, potentially driven by higher-order processing demands in speech and music. 
    more » « less
  7. Abstract Numerous studies have suggested that the perception of a target sound stream (or source) can only be segregated from a complex acoustic background mixture if the acoustic features underlying its perceptual attributes (e.g., pitch, location, and timbre) induce temporally modulated responses that are mutually correlated (or coherent), and that are uncorrelated (incoherent) from those of other sources in the mixture. This “temporal coherence” hypothesis asserts that attentive listening to one acoustic feature of a target enhances brain responses to that feature but would also concomitantly (1) induce mutually excitatory influences with other coherently responding neurons, thus enhancing (or binding) them all as they respond to the attended source; by contrast, (2) suppressive interactions are hypothesized to build up among neurons driven by temporally incoherent sound features, thus relatively reducing their activity. In this study, we report on EEG measurements in human subjects engaged in various sound segregation tasks that demonstrate rapid binding among the temporally coherent features of the attended source regardless of their identity (pure tone components, tone complexes, or noise), harmonic relationship, or frequency separation, thus confirming the key role temporal coherence plays in the analysis and organization of auditory scenes. 
    more » « less
  8. Musical imagery is the voluntary internal hearing of music in the mind without the need for physical action or external stimulation. Numerous studies have already revealed brain areas activated during imagery. However, it remains unclear to what extent imagined music responses preserve the detailed temporal dynamics of the acoustic stimulus envelope and, crucially, whether melodic expectations play any role in modulating responses to imagined music, as they prominently do during listening. These modulations are important as they reflect aspects of the human musical experience, such as its acquisition, engagement, and enjoyment. This study explored the nature of these modulations in imagined music based on EEG recordings from 21 professional musicians (6 females and 15 males). Regression analyses were conducted to demonstrate that imagined neural signals can be predicted accurately, similarly to the listening task, and were sufficiently robust to allow for accurate identification of the imagined musical piece from the EEG. In doing so, our results indicate that imagery and listening tasks elicited an overlapping but distinctive topography of neural responses to sound acoustics, which is in line with previous fMRI literature. Melodic expectation, however, evoked very similar frontal spatial activation in both conditions, suggesting that they are supported by the same underlying mechanisms. Finally, neural responses induced by imagery exhibited a specific transformation from the listening condition, which primarily included a relative delay and a polarity inversion of the response. This transformation demonstrates the top-down predictive nature of the expectation mechanisms arising during both listening and imagery. 
    more » « less
  9. During music listening, humans routinely acquire the regularities of the acoustic sequences and use them to anticipate and interpret the ongoing melody. Specifically, in line with this predictive framework, it is thought that brain responses during such listening reflect a comparison between the bottom-up sensory responses and top-down prediction signals generated by an internal model that embodies the music exposure and expectations of the listener. To attain a clear view of these predictive responses, previous work has eliminated the sensory inputs by inserting artificial silences (or sound omissions) that leave behind only the corresponding predictions of the thwarted expectations. Here, we demonstrate a new alternate approach in which we decode the predictive electroencephalography (EEG) responses to the silent intervals that are naturally interspersed within the music. We did this as participants (experiment 1, 20 participants, 10 female; experiment 2, 21 participants, 6 female) listened or imagined Bach piano melodies. Prediction signals were quantified and assessed via a computational model of the melodic structure of the music and were shown to exhibit the same response characteristics when measured during listening or imagining. These include an inverted polarity for both silence and imagined responses relative to listening, as well as response magnitude modulations that precisely reflect the expectations of notes and silences in both listening and imagery conditions. These findings therefore provide a unifying view that links results from many previous paradigms, including omission reactions and the expectation modulation of sensory responses, all in the context of naturalistic music listening. 
    more » « less